Conversation
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Donghee Na <donghee.na@python.org>
Co-authored-by: devdanzin <74280297+devdanzin@users.noreply.github.com>
Force-pushed from d2de5dc to 726ec3c
Co-authored-by: Jacob Coffee <jacob@z7x.org>
…delines. Add the Guidelines to the contributing table.
savannahostrowski left a comment:
Thank you for doing this, @Mariatta!
My comments are mainly about extending the guidance to cover issues as well. While AI tooling can be great at surfacing real bugs and security issues, I think it's still important that those filing issues understand the problem themselves so we can keep discussions focused and productive.
On the section:

    Considerations for success
    ==========================
On the line:

    Authors must review the work done by AI tooling in detail to ensure it actually makes sense before proposing it as a PR.

Suggested change:

    Authors must review the work done by AI tooling in detail to ensure it actually makes sense before proposing it as a PR or filing it as an issue.
On the line:

    We expect PR authors to be able to explain their proposed changes in their own words.

Suggested change:

    We expect PR authors and those filing issues to be able to explain their proposed changes in their own words.
On the lines:

    Disclosure of the use of AI tools in the PR description is appreciated, while not required. Be prepared to explain how
    the tool was used and what changes it made.

Suggested change:

    Disclosure of the use of AI tools in the PR description is appreciated, while not required. Be prepared to explain how the tool was used and what changes it made.
Looks like some funky line breaking?
I had it break after 120 characters.
But now that I read the devguide's reST markup doc, it seems we're supposed to break at 80 characters.
https://devguide.python.org/documentation/markup/#use-of-whitespace
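As a rough illustration (not part of the devguide itself), re-wrapping prose to the 80-column limit discussed above can be done with Python's standard-library textwrap module; the paragraph text below is just the example sentence from this review thread:

```python
import textwrap

def rewrap(paragraph: str, width: int = 80) -> str:
    """Re-wrap a paragraph of prose to the given column width."""
    # Collapse any existing line breaks into single spaces,
    # then wrap at `width` columns.
    return textwrap.fill(" ".join(paragraph.split()), width=width)

text = (
    "Disclosure of the use of AI tools in the PR description is appreciated, "
    "while not required. Be prepared to explain how the tool was used and "
    "what changes it made."
)
wrapped = rewrap(text, width=80)
print(wrapped)
```

Every output line fits within 80 columns, and joining the wrapped lines back together recovers the original text, so the rewrap is lossless.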
On the lines:

    the responsibility of the contributor. We value good code, concise accurate documentation, and avoiding unneeded code
    churn. Discretion, good judgment, and critical thinking are the foundation of all good contributions, regardless of the
    tools used in their creation.
    Generative AI tools are evolving rapidly, and their work can be helpful. As with using any tool, the resulting
It wasn't done before in this file for some reason, but could we please wrap lines?
I was going to say the opposite :)
The rewrap makes it hard to review what has changed. Can we please keep a minimal diff for now, and only rewrap just before merge?
Just before merge sounds good to me :-)
On the lines:

    Sometimes AI assisted tools make failing unit tests pass by altering or bypassing the tests rather than addressing the
    underlying problem in the code. Such changes do not represent a real fix and are not acceptable.
I'd like to see this worded in more general terms rather than using such a specific example (older models did this a lot more than 2026's). What this is really getting at is that we want people to be cautious about reward hacking rather than addressing the actual underlying problem in a backwards compatible manner.
maybe something along the lines of:
"Some models have had a tendency toward reward hacking: making incorrect changes to fix their limited-context view of the problem at hand rather than focusing on what is correct, including altering or bypassing existing tests. Such changes do not represent a real fix and are not acceptable."
On the list:

    - Consider whether the change is necessary
    - Make minimal, focused changes
    - Follow existing coding style and patterns
    - Write tests that exercise the change
Should we add another bullet point along the lines of:
" - Keep backwards compatibility with prior releases in mind. Existing tests may be ensuring specific API behaviors are maintained."
perhaps a follow-up paragraph after this list:
"Pay close attention to your AI's testing behavior. Have conversations with your AI model about the appropriateness of changes given these principles before you propose them."
I would like text added to emphasize the dangers of AI assistants including work derived from training data, potentially violating the originals' copyrights and/or licensing terms. Core devs don't need that pointed out, but we have contributors of many backgrounds and experience levels. They're responsible for ensuring they have the legal right to grant the PSF permission to re-license their contributions, but explicit is better than implicit. Let's not assume "everyone knows" - everyone doesn't.
Absolutely not. Such words are reactionary made up non-specific dangers with nothing concrete to back them up. Thus they have no place in the Python devguide or policies because they are not actionable. Contributor guidelines, the CLA, and license terms have already long covered this from a policy point of view.
There are many examples from researchers of AI assistants duplicating training data verbatim, without attribution, blatantly violating copyright. How much more specific could it be? Newer users in particular are easily bamboozled by this, unaware of the issues, and seduced by the supremely confident tone AI assistants adopt. I'm concerned about them and the project.

The CLA doesn't even explicitly ask contributors to attest they have a legal right to license their contributions - that's all hiding behind the single word of legalese "valid". We haven't "long covered" this, because the intensified dangers of AI-produced code are a new development.

A few years back, a new contributor opened a PR with code copied verbatim from glibc. How did we catch it? Dead easy: a comment in the code plainly said what followed was copied from glibc. They simply didn't know any better at the time. BTW, they went on to become a core dev. And they knew they were copying. How much more likely is someone to unwittingly contribute work that was copied by their AI assistant? How would they know? How would we?

I'm not claiming we can "fix this". We can't. But we can - and IMO should - alert contributors that the risks of contributing derivative works are surely intensified by the use of AI assistants. Not to dissuade them, but to help inform their decisions. Not a change in policy, but pro-active education. You may as well argue that all cautions about AI-produced code are redundant. For example, why encourage people to "Keep backwards compatibility with prior releases in mind"? That's always been policy too.
BTW, Copilot assures me that provenance issues are the greatest danger projects face from use of AI tools. Being silent about that seems quite ill-advised. But it also tells me that few cases of AI-enabled copyright/licensing violations get any publicity. Organizations want to keep them quiet, and contributors who unwittingly submit tainted code are hardly likely to publicize it either. The chardet case is wildly atypical in every respect.
Tim, I realize my response came off harsh. Sorry! I do care about this, I just want to keep this doc focused on actionable guidance for CPython contributions rather than general AI education, which it'll always be behind on.

The reason I proposed a backwards-compat reminder but am pushing back on this one: backwards compatibility is something the core team actively evaluates on most every PR. Contributors often get it wrong, and it's a concrete thing they can guide their model to keep in mind. It's a problem we actually see, so nudging AI-using contributors to be proactive about it could have a clear payoff.

A provenance warning doesn't have the same shape. We don't have a pattern of AI-laundered copyrighted code showing up in CPython PRs, and even if a contributor reads the warning and takes it seriously, what are they supposed to do? There's no reasonable verification step we can ask of them, and none we can perform either. A caution with no corresponding action just creates unease, and I don't think that earns space in these guidelines.

The licensing obligation itself is real and already lives in the CLA. I'm also not well placed to debate licensing specifics in a public thread, so I'll leave that side of it alone. If the concern is that newer contributors don't understand what they're agreeing to in the CLA, that's worth raising with the PSF as a CLA question rather than something we patch with an AI-specific note here.

My backwards-compat suggestion doesn't have to go in either, FWIW; that's Mariatta's and the docs reviewers' call. But I'd be sad to see a provenance warning land, as I think it'd detract from what's otherwise shaping up to be a refreshingly practical AI guidelines doc. For this PR I suggest we proceed with the other reviews and table the provenance discussion. It isn't something this PR can resolve.